About the Data

Data Source

This dataset contains daily and hourly ridership counts for the Capital Bikeshare system in Washington, DC, together with weather information and additional context about each date. The dataset was obtained from the UCI Machine Learning Repository.

The dataframes are made up of the following columns:

  • instant: record index (num)
  • dteday: date (chr)
  • season: season (1:spring, 2:summer, 3:fall, 4:winter) (num)
  • yr: year (0: 2011, 1: 2012) (num)
  • mnth: month (1 to 12) (num)
  • hr: hour (0 to 23), hourly data only (num)
  • holiday: whether the day is a holiday or not (num)
  • weekday: day of the week (0 to 6) (num)
  • workingday: 1 if the day is neither a weekend nor a holiday, otherwise 0 (num)
  • weathersit: weather situation (num)
    • 1: Clear, Few clouds, Partly cloudy
    • 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    • 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    • 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
  • temp: Normalized temperature in Celsius. The values are divided by 41 (max) (num)
  • atemp: Normalized feeling temperature in Celsius. The values are divided by 50 (max) (num)
  • hum: Normalized humidity. The values are divided by 100 (max) (num)
  • windspeed: Normalized wind speed. The values are divided by 67 (max) (num)
  • casual: count of casual users (num)
  • registered: count of registered users (num)
  • cnt: count of total rental bikes including both casual and registered (num)
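
The normalized weather columns can be mapped back to their original units by multiplying by the maxima quoted in the list above. The following is a minimal sketch under that assumption; the `weather` data frame and its values are made up for illustration and are not rows from the dataset.

```r
# Toy rows with the normalized weather columns (values are made up).
weather <- data.frame(
  temp      = c(0.25, 0.50),
  atemp     = c(0.30, 0.48),
  hum       = c(0.80, 0.55),
  windspeed = c(0.10, 0.20)
)

# Scaling constants taken from the column descriptions above.
scale_max <- c(temp = 41, atemp = 50, hum = 100, windspeed = 67)

# Multiply each column by its maximum to undo the normalization.
original_units <- as.data.frame(
  Map(function(column, max_value) column * max_value,
      weather[names(scale_max)], scale_max)
)
original_units
```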

Data Cleaning

The dataset had no missing values, so the only cleaning needed was converting the date column from character strings to Date objects and the categorical variables from numeric codes to factors.
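
A cleaning step of this kind might look like the sketch below. The `raw` data frame is a made-up stand-in for the daily table, the ISO date format is an assumption about the raw file, and the choice of which columns count as categorical is inferred from the column list rather than taken from the original code.

```r
# Toy stand-in for the daily data; values are made up, column names follow the list above.
raw <- data.frame(
  dteday     = c("2011-01-01", "2011-01-02", "2012-07-04"),
  season     = c(1, 1, 3),
  yr         = c(0, 0, 1),
  mnth       = c(1, 1, 7),
  holiday    = c(0, 0, 1),
  weekday    = c(6, 0, 3),
  workingday = c(0, 0, 0),
  weathersit = c(2, 2, 1),
  stringsAsFactors = FALSE
)

# Confirm there are no missing values before converting anything.
stopifnot(!anyNA(raw))

cleaned <- raw

# Parse the character dates (the "%Y-%m-%d" format is an assumption).
cleaned$dteday <- as.Date(cleaned$dteday, format = "%Y-%m-%d")

# Convert the numerically coded categorical columns to factors.
categorical <- c("season", "yr", "mnth", "holiday",
                 "weekday", "workingday", "weathersit")
cleaned[categorical] <- lapply(cleaned[categorical], factor)

str(cleaned)
```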

Visualizations

Weather Influence on Ridership

[Plot output omitted. Three plots were drawn here, each with a `geom_smooth()` trend fitted using formula y ~ x.]

Ridership Distributions

[Plot output omitted. Two groups of three `geom_smooth()` curves were drawn here, each fitted with method = 'loess' and formula y ~ x.]

Models

Simple Linear Model

## 
## Call:
## lm(formula = cnt ~ . - cnt - instant - dteday - casual - registered, 
##     data = myData_daily)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3944.7  -348.2    63.8   457.4  2912.7 
## 
## Coefficients: (1 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1485.84     239.75   6.198 9.77e-10 ***
## season2       884.71     179.49   4.929 1.03e-06 ***
## season3       832.70     213.13   3.907 0.000102 ***
## season4      1575.35     181.00   8.704  < 2e-16 ***
## yr1          2019.74      58.22  34.691  < 2e-16 ***
## mnth2         131.03     143.78   0.911 0.362443    
## mnth3         542.83     165.43   3.281 0.001085 ** 
## mnth4         451.17     247.57   1.822 0.068820 .  
## mnth5         735.51     267.63   2.748 0.006145 ** 
## mnth6         515.40     282.41   1.825 0.068423 .  
## mnth7          30.80     313.82   0.098 0.921854    
## mnth8         444.95     303.17   1.468 0.142639    
## mnth9        1004.17     265.12   3.788 0.000165 ***
## mnth10        519.67     241.55   2.151 0.031787 *  
## mnth11       -116.69     230.78  -0.506 0.613257    
## mnth12        -89.59     182.21  -0.492 0.623098    
## holiday1     -589.70     180.36  -3.270 0.001130 ** 
## weekday1      212.05     109.49   1.937 0.053187 .  
## weekday2      309.53     107.13   2.889 0.003982 ** 
## weekday3      381.36     107.48   3.548 0.000414 ***
## weekday4      386.34     107.53   3.593 0.000350 ***
## weekday5      436.98     107.44   4.067 5.30e-05 ***
## weekday6      440.46     106.56   4.133 4.01e-05 ***
## workingday1       NA         NA      NA       NA    
## weathersit2  -462.54      77.09  -6.000 3.16e-09 ***
## weathersit3 -1965.09     197.05  -9.972  < 2e-16 ***
## temp         2855.01    1398.16   2.042 0.041526 *  
## atemp        1786.16    1462.12   1.222 0.222261    
## hum         -1535.47     292.45  -5.250 2.01e-07 ***
## windspeed   -2823.30     414.55  -6.810 2.09e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 769.2 on 702 degrees of freedom
## Multiple R-squared:  0.8484, Adjusted R-squared:  0.8423 
## F-statistic: 140.3 on 28 and 702 DF,  p-value: < 2.2e-16
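
The `NA` row for `workingday1` and the note "1 not defined because of singularities" are what `lm()` reports when a predictor is an exact linear combination of the others. A likely explanation here is that `workingday` is fully determined by `weekday` and `holiday`. The sketch below reproduces the effect on made-up data; the `toy` data frame, the weekday coding (0 = Sunday), and the coefficients used to simulate `cnt` are all assumptions for illustration.

```r
set.seed(1)  # reproducible toy data

# Four weeks of made-up days; weekday 0 = Sunday, 6 = Saturday (assumed coding).
toy <- data.frame(weekday = rep(0:6, times = 4))
toy$holiday <- 0
toy$holiday[c(2, 11)] <- 1  # two holidays, both falling on weekdays

# workingday is derived entirely from weekday and holiday.
toy$workingday <- as.integer(toy$weekday %in% 1:5 & toy$holiday == 0)
toy$cnt <- 3000 + 500 * toy$workingday + rnorm(nrow(toy), sd = 200)

factor_cols <- c("weekday", "holiday", "workingday")
toy[factor_cols] <- lapply(toy[factor_cols], factor)

fit <- lm(cnt ~ weekday + holiday + workingday, data = toy)

coef(fit)["workingday1"]  # NA: the column carries no new information
alias(fit)$Complete       # expresses workingday1 in terms of the other columns
```

Dropping either `workingday` or the `weekday` and `holiday` pair from the formula would remove the singularity and the rank-deficiency warning that follows from it.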

Training Multilinear Regression

## Warning in predict.lm(model, newdata = testData): prediction from a
## rank-deficient fit may be misleading

## 
## Call:
## lm(formula = cnt ~ . - instant - casual - registered, data = trainData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3463.5  -359.0    65.5   447.6  2862.0 
## 
## Coefficients: (1 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1485.47     272.21   5.457 7.31e-08 ***
## season2       822.91     198.17   4.153 3.81e-05 ***
## season3       724.46     232.25   3.119 0.001907 ** 
## season4      1616.73     193.22   8.367 4.80e-16 ***
## yr1          2032.14      64.15  31.678  < 2e-16 ***
## mnth2          27.84     157.94   0.176 0.860154    
## mnth3         536.40     183.25   2.927 0.003560 ** 
## mnth4         568.70     276.37   2.058 0.040081 *  
## mnth5         826.34     299.91   2.755 0.006056 ** 
## mnth6         803.29     317.11   2.533 0.011579 *  
## mnth7         345.25     351.16   0.983 0.325945    
## mnth8         772.56     335.71   2.301 0.021746 *  
## mnth9        1295.03     295.74   4.379 1.43e-05 ***
## mnth10        532.99     265.42   2.008 0.045115 *  
## mnth11       -270.15     251.88  -1.073 0.283931    
## mnth12       -267.99     197.27  -1.359 0.174851    
## holiday1     -598.33     212.00  -2.822 0.004940 ** 
## weekday1      184.39     119.75   1.540 0.124158    
## weekday2      329.50     116.51   2.828 0.004851 ** 
## weekday3      465.04     117.61   3.954 8.68e-05 ***
## weekday4      394.69     118.12   3.341 0.000889 ***
## weekday5      438.63     117.46   3.734 0.000208 ***
## weekday6      484.16     114.73   4.220 2.85e-05 ***
## workingday1       NA         NA      NA       NA    
## weathersit2  -447.30      83.65  -5.347 1.31e-07 ***
## weathersit3 -2172.86     217.93  -9.970  < 2e-16 ***
## temp          553.24    2519.05   0.220 0.826245    
## atemp        3534.70    2709.17   1.305 0.192529    
## hum         -1181.15     316.01  -3.738 0.000205 ***
## windspeed   -2653.50     478.29  -5.548 4.48e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 754 on 555 degrees of freedom
## Multiple R-squared:  0.8526, Adjusted R-squared:  0.8451 
## F-statistic: 114.6 on 28 and 555 DF,  p-value: < 2.2e-16
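
The training fit uses 584 rows (555 residual degrees of freedom plus 29 estimated coefficients), which is consistent with an 80/20 split of the 731 daily records. The split procedure itself is not shown, so the sketch below is only one plausible version: random sampling, the seed, the made-up `toy` data, and the use of RMSE as the test metric are all assumptions.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Made-up data standing in for the daily table; 731 rows to match the full sample.
n <- 731
toy <- data.frame(temp = runif(n), hum = runif(n))
toy$cnt <- 1500 + 4000 * toy$temp - 1500 * toy$hum + rnorm(n, sd = 700)

# 80/20 split (assumed): sample training row indices without replacement.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
trainData <- toy[train_idx, ]
testData  <- toy[-train_idx, ]

# Fit on the training rows only, then score the held-out rows.
model <- lm(cnt ~ ., data = trainData)
pred  <- predict(model, newdata = testData)

# Root mean squared error on the test set.
rmse <- sqrt(mean((testData$cnt - pred)^2))
rmse
```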

Lasso Regression

## [1] 120.4481
## [1] 646592.6
## (Intercept) (Intercept)     season2     season3     season4         yr1 
## 1665.863991    0.000000  681.410070  511.041832 1159.414262 1898.034466 
##       mnth2       mnth3       mnth4       mnth5       mnth6       mnth7 
##  -85.144376  338.809529  348.916074  585.086078  400.649468   -5.762268 
##       mnth8       mnth9      mnth10 
##  405.070653  968.596860  603.388394
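
The repeated `(Intercept)` entry with a coefficient of exactly 0 is what one would expect if the design matrix built by `model.matrix()` kept its own intercept column while the lasso fitter added a second one. That is an inference from the output, not something the original code confirms. A minimal base R sketch of how the extra column arises and how it can be dropped, using a made-up `toy` data frame:

```r
# Made-up factor data; column names follow the dataset, values do not.
toy <- data.frame(
  cnt    = c(900, 800, 1300, 1500, 1600, 1700),
  season = factor(c(1, 2, 3, 4, 1, 2)),
  yr     = factor(c(0, 0, 0, 1, 1, 1))
)

# model.matrix() expands factors to dummy columns and prepends an intercept.
x_full <- model.matrix(cnt ~ ., data = toy)
colnames(x_full)  # the first column is "(Intercept)"

# Drop the intercept column before handing the matrix to a fitter
# that adds its own intercept.
x <- x_full[, -1]
colnames(x)
```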

Neural Networks

Predicting total ridership

## Joining with `by = join_by(temp, hum, windspeed, cnt, season1, season2,
## season3, season4, mnth1, mnth2, mnth3, mnth4, mnth5, mnth6, mnth7, mnth8,
## mnth9, mnth10, mnth11, mnth12, holiday0, holiday1, weekday0, weekday1,
## weekday2, weekday3, weekday4, weekday5, weekday6, workingday0, workingday1,
## weathersit1, weathersit2, weathersit3)`

## [1] 0.7073779

Predicting casual ridership percentage

## Joining with `by = join_by(temp, hum, windspeed, casual_percent, season1,
## season2, season3, season4, mnth1, mnth2, mnth3, mnth4, mnth5, mnth6, mnth7,
## mnth8, mnth9, mnth10, mnth11, mnth12, holiday0, holiday1, weekday0, weekday1,
## weekday2, weekday3, weekday4, weekday5, weekday6, workingday0, workingday1,
## weathersit1, weathersit2, weathersit3)`

## [1] 0.82432
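
The column names in the join messages above (`season1` to `season4`, `holiday0` and `holiday1`, and so on) suggest that every factor level was expanded into its own 0/1 indicator column before the networks were trained. How the original code did this is not shown. One base R way to get the same layout is sketched below on a made-up `toy` data frame.

```r
# Made-up factor columns; names follow the dataset, values do not.
toy <- data.frame(
  season  = factor(c(1, 2, 3, 4, 1, 2)),
  holiday = factor(c(0, 0, 1, 0, 0, 1))
)

# contrasts = FALSE requests a full indicator matrix for each factor,
# so no level is dropped as a reference category.
full_contrasts <- lapply(toy, contrasts, contrasts = FALSE)

# "- 1" removes the intercept so that only indicator columns remain.
one_hot <- model.matrix(~ . - 1, data = toy, contrasts.arg = full_contrasts)
colnames(one_hot)
```

Each row then has exactly one 1 per original factor.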